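The previous posts introduced observability and OpenTelemetry, the open-source observability toolkit.
Only at the end did I realize I still had not said anything about OpenTelemetry on Kubernetes. All I can say is that I am no genius and this stuff is hard to digest XD
It took me several months of trial and error to get through the material above, picking up knowledge from each attempt. The sharp people I know read enough to get the big picture and then try things; I try things and digest as I go. Otherwise the concepts stay fuzzy: without doing it yourself, you will not understand what observability is or why microservices need it.
In my own Kubernetes cluster, I ended up deploying only three things: the Operator, the Collector, and Instrumentation.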
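Deploying the Operator installs the pre-defined CRDs (CustomResourceDefinition), so that we can then create resources of those kinds in Kubernetes:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
A small reminder: the official docs have a line saying that cert-manager must be installed first.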
To install the operator in an existing cluster, make sure you have cert-manager installed and run:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.6.3/cert-manager.yaml
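Whatever you do, don't be like me: my English is not great, so I saw a command and ran it right away XD
Once the Operator is installed, you can deploy an OpenTelemetryCollector resource in Kubernetes: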
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: daemonset
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
            cors:
              allowed_origins:
                - "http://*"
                - "https://*"
    processors:
      # NOTE: memory_limiter is defined here but not listed in any pipeline below,
      # so it has no effect unless it is added to a pipeline's processors list.
      memory_limiter:
        check_interval: 1s
        limit_percentage: 75
        spike_limit_percentage: 15
      batch:
        send_batch_size: 10000
        timeout: 10s
    exporters:
      # NOTE: Prior to v0.86.0 use `logging` instead of `debug`.
      debug:
      # NOTE: 14250 is Jaeger's own gRPC port, not an OTLP port; an OTLP
      # exporter normally targets 4317, as otlp/jaeger/B does.
      otlp/jaeger/A:
        endpoint: "jaeger-collector.xxx.svc.cluster.local:14250"
        tls:
          insecure: true
      otlp/jaeger/B:
        endpoint: "jaeger-collector.xxx.svc.cluster.local:4317"
        tls:
          insecure: true
      otlp/python/C:
        endpoint: "xx.xx.x.xxx:8003"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp/jaeger/A, otlp/jaeger/B, otlp/python/C, debug]
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]
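To make the batch settings above (send_batch_size: 10000, timeout: 10s) a bit more concrete, here is a small Python sketch of the idea: collect items and hand them to the exporter either when the batch is full or when the oldest item has waited long enough. The Batcher class, its tick method, and the injectable clock are my own illustration, not the Collector's code, and the real batch processor has more details than this toy model.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Toy model: flush when the batch is full or the timeout has elapsed."""

    def __init__(self, send_batch_size: int, timeout_s: float,
                 export: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_batch_size = send_batch_size
        self.timeout_s = timeout_s
        self.export = export
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.items: List[str] = []
        self.first_item_at: Optional[float] = None

    def add(self, item: str) -> None:
        if not self.items:
            # Remember when the current batch started waiting.
            self.first_item_at = self.clock()
        self.items.append(item)
        if len(self.items) >= self.send_batch_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once the timeout has passed."""
        if self.items and self.clock() - self.first_item_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if self.items:
            self.export(self.items)
            self.items = []  # start a fresh list; the exported one is left untouched
            self.first_item_at = None


if __name__ == "__main__":
    batcher = Batcher(send_batch_size=3, timeout_s=10.0, export=print)
    for span in ["span-1", "span-2", "span-3", "span-4"]:
        batcher.add(span)  # the first three are exported together
    batcher.flush()        # force out the remaining partial batch
```

With a large send_batch_size and a 10s timeout, a low-traffic service will mostly be flushed by the timeout, so data can show up in the backend several seconds late.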
With the Collector deployed, you can deploy Instrumentation:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: xxx-instrumentation
spec:
  exporter:
    # A YAML mapping can hold only one endpoint key. 4318 (OTLP over HTTP) is kept
    # because the pod description later in this post shows http/protobuf with
    # port 4318 in effect. The gRPC alternative would be port 4317.
    endpoint: http://otelcol-collector.xxx.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
    - b3
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: 'true'
  sampler:
    type: parentbased_traceidratio
    argument: "1"
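The sampler setting above (parentbased_traceidratio with argument "1") is easier to reason about with a small model. The sketch below is my own simplification, not the SDK's code: if the incoming request already carries a sampling decision, follow it; otherwise keep a trace when the low 64 bits of its trace ID fall below ratio * 2^64. The function names are made up for this example.

```python
from typing import Optional

TRACE_ID_LOW_64_MASK = (1 << 64) - 1


def ratio_decision(trace_id: int, ratio: float) -> bool:
    """Keep the trace if the low 64 bits of its ID fall under the ratio bound."""
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_LOW_64_MASK) < bound


def parent_based_decision(trace_id: int, ratio: float,
                          parent_sampled: Optional[bool]) -> bool:
    """parent_sampled is None for a root span, else the parent's sampled flag."""
    if parent_sampled is not None:
        return parent_sampled  # a child follows whatever its parent decided
    return ratio_decision(trace_id, ratio)


if __name__ == "__main__":
    trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
    print(parent_based_decision(trace_id, 1.0, None))   # root span, ratio 1: kept
    print(parent_based_decision(trace_id, 1.0, False))  # parent said no: dropped
```

In this model a ratio of 1 keeps every root trace, which is handy while experimenting; a busy service would probably want a lower value.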
Let's take one of my microservices as today's example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
  namespace: xxx
  labels:
    app: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
      annotations:
        # Asks the Operator to inject Python auto-instrumentation into this pod.
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: test
          image: lgcat/exp_test:v1
          imagePullPolicy: Always
          ports:
            - containerPort: 8000
          env:
            - name: REDIS_SERVER
              value: "redis://redis.xxx.svc.cluster.local:6379"
            - name: USER_SERVICE
              value: http://test1-service.xxx.svc.cluster.local:8001
            - name: OTLP_GRPC_ENDPOINT
              value: http://otelcol-collector.xxx.svc.cluster.local:4317
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
      imagePullSecrets:
        - name: regcred
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - 10.20.1.231
                      - 10.20.1.233
---
apiVersion: v1
kind: Service
metadata:
  name: test
  namespace: xxx
  labels:
    app: test-service
spec:
  selector:
    app: test
  ports:
    - protocol: TCP
      port: 8000
      # NOTE: targetPort must be the port the app actually listens on;
      # the Deployment above declares containerPort 8000.
      targetPort: 8004
      # NOTE: 8000 is outside the default NodePort range (30000-32767), so this
      # only works if the cluster sets a custom --service-node-port-range.
      nodePort: 8000
  type: NodePort
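One detail in the Deployment above that is easy to get wrong is where the inject-python annotation lives: it sits under spec.template.metadata.annotations (the pod template), not under the Deployment's own metadata. As far as I understand the Operator, it looks at pods (or the namespace), so an annotation placed only on the Deployment object would not trigger injection. The helper below is a hypothetical check over a manifest that has already been loaded into a Python dict; it does not talk to the cluster.

```python
from typing import Any, Dict

INJECT_PYTHON = "instrumentation.opentelemetry.io/inject-python"


def injection_annotation(deployment: Dict[str, Any]) -> str:
    """Report where the inject-python annotation was found in a Deployment dict."""
    top = deployment.get("metadata", {}).get("annotations") or {}
    pod = (deployment.get("spec", {}).get("template", {})
           .get("metadata", {}).get("annotations") or {})
    if INJECT_PYTHON in pod:
        return "pod-template"      # where the manifest in this post puts it
    if INJECT_PYTHON in top:
        return "deployment-only"   # likely a mistake: pods will not carry it
    return "missing"


if __name__ == "__main__":
    manifest = {
        "metadata": {"name": "test", "namespace": "xxx"},
        "spec": {"template": {"metadata": {
            "labels": {"app": "test"},
            "annotations": {INJECT_PYTHON: "true"},
        }}},
    }
    print(injection_annotation(manifest))
```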
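After deploying, we can describe the microservice's pod. Compared with an ordinary deployment you will notice one difference: there is an extra Init Containers section, as shown below.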
Init Containers:
  opentelemetry-auto-instrumentation-python:
    Container ID:  docker://30549f52f5d417cbc14a5f4061c163f7361bbcbdfcd4d7e80b1f7ae38fdee8b9
    Image:         ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    Image ID:      docker-pullable://ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python@sha256:6cc9c9f0a0f320f9585c3869e62b960c8ce5f92751b07ca22372805c62e987b0
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /autoinstrumentation/.
      /otel-auto-instrumentation-python
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sat, 28 Sep 2024 06:11:24 +0000
      Finished:     Sat, 28 Sep 2024 06:11:31 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  32Mi
    Requests:
      cpu:     50m
      memory:  32Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rfzm7 (ro)
Containers:
  test:
    Container ID:   docker://c23f31148de65dc6186cbbcfcdb95643482c90817d9aa57dbb324dcc198d5405
    Image:          lgcat/exp_test:v1
    Image ID:       docker-pullable://lgcat/exp_test@sha256:fccbd8e3c66986c429afba3be6b61963c5c523cbfe91edb3e47459643afecacb
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 28 Sep 2024 06:11:39 +0000
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     50m
      memory:  64Mi
    Environment:
      OTEL_NODE_IP: (v1:status.hostIP)
      OTEL_POD_IP: (v1:status.podIP)
      REDIS_SERVER: redis://redis.xxx.svc.cluster.local:6379
      USER_SERVICE: http://test1-service.xxx.svc.cluster.local:8001
      OTLP_GRPC_ENDPOINT: http://otelcol-collector.xxx.svc.cluster.local:4317
      OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED: true
      PYTHONPATH: /otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation-python
      OTEL_TRACES_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: http/protobuf
      OTEL_METRICS_EXPORTER: otlp
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: http/protobuf
      OTEL_SERVICE_NAME: test
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otelcol-collector.xxx.svc.cluster.local:4318
      OTEL_RESOURCE_ATTRIBUTES_POD_NAME: test-74d786f79f-xm8kx (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME: (v1:spec.nodeName)
      OTEL_PROPAGATORS: tracecontext,baggage,b3
      OTEL_TRACES_SAMPLER: parentbased_traceidratio
      OTEL_TRACES_SAMPLER_ARG: 1
      OTEL_RESOURCE_ATTRIBUTES: k8s.container.name=test,k8s.deployment.name=test,k8s.namespace.name=xxx,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=test-74d786f79f,service.instance.id=xxx.$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME).test,service.version=v1
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rfzm7 (ro)
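The OTEL_RESOURCE_ATTRIBUTES value in the output above still contains $(...) placeholders. Kubernetes substitutes references to environment variables defined earlier in the same container, and the SDK then reads the result as comma-separated key=value pairs. The sketch below imitates both steps so you can see what the application ends up with. The function names and the node name node-a are assumptions for illustration (the describe output does not show the node name), and the $$(VAR) escape that Kubernetes supports is deliberately not handled.

```python
import re
from typing import Dict

_REF = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_refs(value: str, env: Dict[str, str]) -> str:
    """Replace $(NAME) with env[NAME]; unknown names are left unchanged."""
    return _REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def parse_resource_attributes(value: str) -> Dict[str, str]:
    """Split 'k1=v1,k2=v2' into a dict, skipping empty entries."""
    attrs: Dict[str, str] = {}
    for pair in value.split(","):
        if not pair.strip():
            continue
        key, _, val = pair.partition("=")
        attrs[key.strip()] = val.strip()
    return attrs


if __name__ == "__main__":
    env = {
        "OTEL_RESOURCE_ATTRIBUTES_POD_NAME": "test-74d786f79f-xm8kx",
        "OTEL_RESOURCE_ATTRIBUTES_NODE_NAME": "node-a",  # hypothetical value
    }
    raw = ("k8s.container.name=test,k8s.deployment.name=test,k8s.namespace.name=xxx,"
           "k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),"
           "k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),"
           "k8s.replicaset.name=test-74d786f79f,"
           "service.instance.id=xxx.$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME).test,"
           "service.version=v1")
    for key, val in parse_resource_attributes(expand_refs(raw, env)).items():
        print(f"{key} = {val}")
```

These attributes are what let a backend tell which pod, node, and namespace a given span came from.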
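From the output above we can see that the Init Container's main job is to copy the OpenTelemetry auto-instrumentation files from its own /autoinstrumentation/ directory into /otel-auto-instrumentation-python, a directory the main container can access. The copy is done with the command cp -r /autoinstrumentation/. /otel-auto-instrumentation-python.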
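The Init Container and the main container share the /otel-auto-instrumentation-python mount point. This means that once the Init Container has finished copying, the main container can access the copied OpenTelemetry auto-instrumentation files while it runs.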
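Why is copying files into a shared directory enough to instrument an unmodified app? The describe output shows that the Operator also sets PYTHONPATH to point into that directory, and Python imports a module named sitecustomize at startup if it finds one on the path. The sketch below reproduces that mechanism locally with a temporary directory standing in for the shared volume. The DEMO_INSTRUMENTED variable and the one-line sitecustomize.py are made up for the demo; the real files set up OpenTelemetry instead.

```python
import os
import subprocess
import sys
import tempfile


def run_with_injected_path() -> str:
    """Start a child Python with PYTHONPATH pointing at a dir holding sitecustomize.py."""
    with tempfile.TemporaryDirectory() as shared_dir:
        # Stand-in for what the init container copies into the shared volume.
        with open(os.path.join(shared_dir, "sitecustomize.py"), "w") as f:
            f.write("import os\nos.environ['DEMO_INSTRUMENTED'] = 'yes'\n")

        # Stand-in for the PYTHONPATH the Operator injects into the main container.
        env = dict(os.environ, PYTHONPATH=shared_dir)

        # The "application" code knows nothing about the hook.
        app_code = "import os; print(os.environ.get('DEMO_INSTRUMENTED', 'no'))"
        result = subprocess.run([sys.executable, "-c", app_code],
                                env=env, capture_output=True, text=True, check=True)
        return result.stdout.strip()


if __name__ == "__main__":
    # On a standard CPython install this should print "yes".
    print(run_with_injected_path())
```

The application code never imports anything extra; the hook runs purely because of where the files were placed and what PYTHONPATH says.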
After the main container starts, it loads the OpenTelemetry tooling from this mounted directory and automatically begins collecting traces, logs, and metrics.
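For those traces to join up across services, the trace context has to travel with each request. The tracecontext entry in OTEL_PROPAGATORS refers to the W3C Trace Context format, which carries it in a traceparent HTTP header of the form version-traceid-spanid-flags. Below is a minimal sketch of building and parsing that header for version 00. The function names are mine, and the SDK's propagator handles more cases (tracestate, future versions) than this does.

```python
import re
from typing import Optional, Tuple

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: int, span_id: int, sampled: bool) -> str:
    """Format a version-00 traceparent header value."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id:032x}-{span_id:016x}-{flags}"


def parse_traceparent(header: str) -> Optional[Tuple[int, int, bool]]:
    """Return (trace_id, span_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_hex, span_hex, flags_hex = match.groups()
    if version == "ff":  # reserved as invalid by the spec
        return None
    trace_id, span_id = int(trace_hex, 16), int(span_hex, 16)
    if trace_id == 0 or span_id == 0:  # all-zero IDs are invalid
        return None
    return trace_id, span_id, bool(int(flags_hex, 16) & 0x01)


if __name__ == "__main__":
    header = build_traceparent(0x4BF92F3577B34DA6A3CE929D0E0E4736,
                               0x00F067AA0BA902B7, sampled=True)
    print(header)
    print(parse_traceparent(header))
```

The sampled flag in the last field is what a downstream parent-based sampler follows when deciding whether to keep its own spans.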
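So deploying the Operator, Collector, and Instrumentation is all it takes. It looks simple, but it is a big step toward observability on Kubernetes: Instrumentation produces the telemetry and exports it to the Collector, and the Collector then forwards it to downstream visualization tools according to its exporters.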
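Since everything downstream depends on how the Collector's pipelines are wired, a quick sanity check of that wiring can save some head-scratching. The sketch below takes the Collector config from earlier in this post, trimmed and written as a plain Python dict (an assumption to avoid needing a YAML library), and reports components that a pipeline references but nobody defined, plus components that are defined but never used. Both helper functions are my own illustration, not part of any OpenTelemetry tool.

```python
from typing import Any, Dict, List

SECTIONS = ("receivers", "processors", "exporters")

# Trimmed copy of the Collector config from this post (settings omitted).
COLLECTOR_CONFIG: Dict[str, Any] = {
    "receivers": {"otlp": {}},
    "processors": {"memory_limiter": {}, "batch": {}},
    "exporters": {"debug": {}, "otlp/jaeger/A": {}, "otlp/jaeger/B": {},
                  "otlp/python/C": {}},
    "service": {"pipelines": {
        "traces": {"receivers": ["otlp"], "processors": ["batch"],
                   "exporters": ["otlp/jaeger/A", "otlp/jaeger/B",
                                 "otlp/python/C", "debug"]},
        "metrics": {"receivers": ["otlp"], "processors": ["batch"],
                    "exporters": ["debug"]},
        "logs": {"receivers": ["otlp"], "processors": ["batch"],
                 "exporters": ["debug"]},
    }},
}


def undefined_components(config: Dict[str, Any]) -> List[str]:
    """Components a pipeline references that have no definition."""
    problems = []
    for name, pipeline in config["service"]["pipelines"].items():
        for section in SECTIONS:
            for component in pipeline.get(section, []):
                if component not in config.get(section, {}):
                    problems.append(f"{name}: {section}/{component} is not defined")
    return problems


def unused_components(config: Dict[str, Any]) -> List[str]:
    """Components that are defined but not referenced by any pipeline."""
    pipelines = config["service"]["pipelines"].values()
    unused = []
    for section in SECTIONS:
        referenced = {c for p in pipelines for c in p.get(section, [])}
        unused += [f"{section}/{c}" for c in config.get(section, {})
                   if c not in referenced]
    return unused


if __name__ == "__main__":
    print("undefined:", undefined_components(COLLECTOR_CONFIG))
    print("unused:", unused_components(COLLECTOR_CONFIG))
```

Run against the config in this post, the check flags memory_limiter as defined but not used by any pipeline, which is worth knowing if you were counting on it to protect the Collector's memory.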